Representation of _Bool

Keith Thompson

May 23, 2021, 10:14:24 PM

As promised, I've studied what the C standard says about the
requirements for the representation of _Bool. I've referred to the
C11 standard and to drafts of C17 and C2x (N2596). C11 and C17 do
not differ in this area as far as I can tell, but there are some
new things in the C2x proposal.

An object declared as type _Bool is large enough to store the values
0 and 1.

_Bool is an unsigned integer type.

The rank of _Bool shall be less than the rank of all other standard
integer types. This implies that the range of values of _Bool is
a subrange of the range of values of unsigned char. A _Bool object
cannot store a value less than 0 or greater than UCHAR_MAX.

When any scalar value is converted to _Bool, the result is 0 if the
value compares equal to 0; otherwise, the result is 1. This makes
it difficult to store a value other than 0 or 1 in a _Bool object,
but it can be done (or at least attempted) via type-punning using
a union with _Bool and unsigned char members.
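
As a minimal sketch of that conversion rule (the helper names are
hypothetical, not from the standard), compare conversion to _Bool with
conversion to unsigned char for the same input:

```c
/* Hypothetical helpers for illustration only. */

/* Conversion to _Bool: 0 if the value compares equal to 0, otherwise 1. */
int convert_to_bool(unsigned long value) {
    return (_Bool)value;
}

/* Conversion to unsigned char: small values are kept unchanged. */
int convert_to_uchar(unsigned long value) {
    return (unsigned char)value;
}

/* The same rule applies to floating-point operands: any nonzero
   value, including a fraction, converts to 1. */
int convert_double_to_bool(double value) {
    return (_Bool)value;
}
```

So an ordinary conversion can never leave 2 in a _Bool, even though 2
fits comfortably in an unsigned char.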

C11 footnote: "While the number of bits in a _Bool object is at least
CHAR_BIT, the width (number of sign and value bits) of a _Bool may be
just 1 bit." This acknowledges that _Bool *may* have more than one
value bit, and therefore may represent values other than 0 and 1.
N2596 drops the parenthesized clause (probably because _Bool has
no sign bit).

N2596 adds a macro BOOL_WIDTH to <limits.h>, "width for an object
of type _Bool". It is *at least* 1, implying again that it can
be greater than 1. (I don't see any implementation that defines
BOOL_WIDTH.)
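
A small sketch of how code might probe for the macro, assuming only
that BOOL_WIDTH, where present, comes from <limits.h> as N2596
describes (the function name and the -1 convention are made up here):

```c
#include <limits.h>

/* Returns BOOL_WIDTH where <limits.h> defines it, or -1 where the
   implementation does not provide the macro. */
int reported_bool_width(void) {
#ifdef BOOL_WIDTH
    return BOOL_WIDTH;
#else
    return -1;
#endif
}
```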

(N2596 also changes the definitions of false and true in <stdbool.h>
so they're of type _Bool rather than int. This doesn't affect
representation.)

Conclusions:

sizeof (_Bool) >= 1. It may be greater than 1, but that would
be weird. If sizeof (_Bool) > 1, then it must have padding bits.

_Bool has no sign bit.

_Bool has *at least* one value bit. It may have more, but no more
than CHAR_BIT of them.

The standard allows some variations in how _Bool is represented.
C programmers would be well advised to avoid writing code for which
this matters.

A conforming implementation may do any of the following (I'll assume
for brevity that CHAR_BIT==8):

* _Bool has 8 value bits. Any value from 0 to 255 inclusive
is valid. Storing a value other than 0 or 1 can be done via
type punning using a union of a _Bool and an unsigned char.

* _Bool has 1 value bit and 7 padding bits, with 254 trap
representations. Using type punning to store a value other than
0 or 1 in a _Bool object, and then accessing that object's value,
results in undefined behavior.

* _Bool has 1 value bit, 7 padding bits, and no trap representations.
Since padding bits by definition do not contribute to the value,
only the value bit's value is relevant. Using type punning to store
a value other than 0 or 1 in a _Bool object gives it a value of 0
if the value is even, 1 if the value is odd.

Other variations are possible (and arguably silly). For example, _Bool
might have 4 value bits and 4 padding bits, or it might be bigger than
1 byte. I expect that kind of thing only on the DeathStation 9000.
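
The three variations listed above can be modeled roughly as follows.
This is only a sketch under the same CHAR_BIT==8, one-byte assumption;
the enum, the function name, and the use of -1 to stand for "undefined
behavior" are all invented for illustration:

```c
/* Hypothetical models of the three conforming variations. */
enum bool_model {
    EIGHT_VALUE_BITS,    /* every byte value 0..255 is a valid value  */
    ONE_BIT_WITH_TRAPS,  /* only 0 and 1 valid; 254 trap representations */
    ONE_BIT_NO_TRAPS     /* padding ignored; only the low-order bit counts */
};

/* Returns the value a read of the _Bool would yield for the stored
   byte, or -1 where the read would have undefined behavior. */
int model_read(enum bool_model model, unsigned char rep) {
    switch (model) {
    case EIGHT_VALUE_BITS:
        return rep;
    case ONE_BIT_WITH_TRAPS:
        return rep <= 1 ? rep : -1;
    case ONE_BIT_NO_TRAPS:
        return rep & 1;
    }
    return -1;
}
```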

Here's a small program that attempts to explore how an implementation
represents objects of type _Bool:

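    #include <stdio.h>
    #include <limits.h>

    /* Overlay a _Bool with a byte so its representation can be set directly. */
    union U {
        _Bool b;
        unsigned char rep;
    };

    int main(void) {
        union U obj;
        _Bool b;
        /* Try representations 0 through 3 and read each back as a _Bool. */
        for (obj.rep = 0; obj.rep <= 3; obj.rep ++) {
            printf("obj.b = %d, which is %s, obj.rep = %d",
                   obj.b, obj.b ? "true " : "false", obj.rep);
            b = obj.b;  /* simple assignment, _Bool to _Bool */
            printf(" ... b = %d, which is %s\n", b, b ? "true " : "false");
        }
    }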

Using gcc 11.1.0, on Ubuntu 20.02 x86_64, I get this output:
obj.b = 0, which is false, obj.rep = 0 ... b = 0, which is false
obj.b = 1, which is true , obj.rep = 1 ... b = 1, which is true
obj.b = 2, which is true , obj.rep = 2 ... b = 2, which is true
obj.b = 3, which is true , obj.rep = 3 ... b = 3, which is true

This mostly looks like _Bool has 8 value bits, but if that were the
case, then I *think* that the value of b would always be 0 or 1.
The rules of simple assignment (b = obj.b) specify that the value
of the right operand is converted to the type of the assignment
expression. Converting *any* scalar value to _Bool yields 0 or 1,
even if the value is already of type _Bool. So I conclude that
for gcc, 2 and 3 (and probably anything other than 0 or 1) are
trap representations for _Bool, and that _Bool has 1 value bit,
7 padding bits, and 254 trap representations.

It's possible that the intent is for _Bool to have 8 value bits and the
gcc authors' interpretation of the requirements for simple assignment
differs from mine. (I won't presume to say who's right.)

Using clang 12.0.0 on the same system, I get:
obj.b = 0, which is false, obj.rep = 0 ... b = 0, which is false
obj.b = 1, which is true , obj.rep = 1 ... b = 1, which is true
obj.b = 0, which is false, obj.rep = 2 ... b = 0, which is false
obj.b = 1, which is true , obj.rep = 3 ... b = 1, which is true

All bits other than the low-order one are ignored. This is
consistent with _Bool having 1 value bit, 7 padding bits, and no
trap representations. It's also consistent with 2 and 3 being
trap representations, since that would cause undefined behavior.
It's not consistent with _Bool having more than 1 value bit.

When implementers add support for BOOL_WIDTH, they'll have to decide
explicitly how many value bits _Bool has.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Ben Bacarisse

May 24, 2021, 7:11:29 AM

Keith Thompson <Keith.S.T...@gmail.com> writes:

> Conclusions:
>
> sizeof (_Bool) >= 1. It may be greater than 1, but that would
> be weird. If sizeof (_Bool) > 1, then it must have padding bits.

I don't understand how you draw that last conclusion.

<cut>

> Here's a small program that attempts to explore how an implementation
> represents objects of type _Bool:
>
> #include <stdio.h>
> #include <limits.h>
>
> union U {
> _Bool b;
> unsigned char rep;
> };

When doing this kind of thing, my preference is to write

    union U {
        _Bool b;
        unsigned char rep[sizeof (_Bool)];
    };

even when it's very unlikely that the size will be > 1. It makes the
purpose so very clear.
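
A related sketch, using memcpy rather than a union (a swapped-in
technique, not what the original program does), also sizes everything
by sizeof (_Bool). The function names are hypothetical:

```c
#include <string.h>

/* Copies the object representation of b into out, which must have
   room for sizeof (_Bool) bytes. Returns the number of bytes copied. */
size_t bool_to_bytes(_Bool b, unsigned char *out) {
    memcpy(out, &b, sizeof b);
    return sizeof b;
}

/* Rebuilds a _Bool from bytes previously produced by bool_to_bytes.
   Only meaningful for byte patterns that are valid _Bool
   representations; other patterns may be trap representations. */
_Bool bool_from_bytes(const unsigned char *in) {
    _Bool b;
    memcpy(&b, in, sizeof b);
    return b;
}
```

Copying a valid representation out and back again round-trips; the
interesting (and possibly undefined) cases are the byte patterns that
no ordinary conversion to _Bool would ever produce.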

--
Ben.

Richard Damon

May 24, 2021, 7:43:18 AM

On 5/24/21 7:11 AM, Ben Bacarisse wrote:
> Keith Thompson <Keith.S.T...@gmail.com> writes:
>
>> Conclusions:
>>
>> sizeof (_Bool) >= 1. It may be greater than 1, but that would
>> be weird. If sizeof (_Bool) > 1, then it must have padding bits.
>
> I don't understand how you draw that last conclusion.

rank(_Bool) < rank(unsigned char) so
max value of _Bool <= UCHAR_MAX so
max number of value bits in _Bool is CHAR_BIT

_Bool has sizeof(_Bool)*CHAR_BIT bits in it, and only CHAR_BIT of them
can be value bits.

If sizeof(_Bool) > 1, there are bits left over that are neither value
bits nor sign bits (it has no sign bit, being an unsigned type), so they
must be padding bits.
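
The arithmetic can be sketched as a one-line helper (the name is
hypothetical, and size_in_bytes is assumed to be at least 1):

```c
#include <limits.h>
#include <stddef.h>

/* Lower bound on the number of padding bits in a _Bool occupying
   size_in_bytes bytes, given that at most CHAR_BIT of its bits can
   be value bits and none are sign bits. */
size_t min_bool_padding_bits(size_t size_in_bytes) {
    return size_in_bytes * CHAR_BIT - CHAR_BIT;
}
```

For a one-byte _Bool the bound is 0, so padding is possible but not
forced; for any larger size it is strictly positive.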

Tim Rentsch

May 24, 2021, 9:49:33 AM

Keith Thompson <Keith.S.T...@gmail.com> writes:

> As promised, I've studied what the C standard says about the
> requirements for the representation of _Bool. I've referred to the
> C11 standard and to drafts of C17 and C2x (N2596). C11 and C17 do
> not differ in this area as far as I can tell, but there are some
> new things in the C2x proposal.

Thank you, this looks good (and nice to have C17 and C2x included).

> [...]

>
> The rank of _Bool shall be less than the rank of all other standard
> integer types. This implies that the range of values of _Bool is
> a subrange of the range of values of unsigned char. A _Bool object
> cannot store a value less than 0 or greater than UCHAR_MAX.

AFAICT the width of _Bool is permitted to be greater than the
width of an extended unsigned integer type whose width is less
than CHAR_BIT. It seems weird to allow that, but I don't see
anything that forbids it.

> [...]

>
> Here's a small program that attempts to explore how an implementation
> represents objects of type _Bool:
>

> [..program..]
>
> [..gcc results and analysis..] So I conclude that

> for gcc, 2 and 3 (and probably anything other than 0 or 1) are
> trap representations for _Bool, and that _Bool has 1 value bit,
> 7 padding bits, and 254 trap representation.

Yes I believe that's right.

> It's possible that the intent is for _Bool to have 8 value bits and the
> gcc authors' interpretation of the requirements for simple assignment
> differ from mine. (I won't presume to say who's right.)

Other evidence suggests gcc takes the width of _Bool to be 1.
See below.

> [..clang results and analysis..]

> It's not consistent with _Bool having more than 1 value bit.

There could be another value bit that is not adjacent to the low
order bit, with a padding bit in between. Of course, it is highly
unlikely that that is the case.

> When implementers add support for BOOL_WIDTH, they'll have to decide
> explicitly how many value bits _Bool has.

I think other parts of the language necessitate the decision
having been made, even without BOOL_WIDTH. Both gcc and
clang take the width of _Bool to be 1, as may be seen by
compiling the following program:

    struct {
        _Bool just_checking : 2;
    } test;
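
For contrast, here is a sketch of the width that is accepted. The
struct and function names are hypothetical; the behavior relies on a
_Bool bit-field having the semantics of a _Bool:

```c
/* A 1-bit _Bool bit-field does not exceed the width of its type. */
struct flags {
    _Bool ok : 1;
};

/* Stores value into the bit-field and returns what the field then
   holds. The store converts to _Bool first, so any nonzero value
   becomes 1. */
int store_in_bool_bitfield(int value) {
    struct flags f;
    f.ok = value;
    return f.ok;
}
```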

Ben Bacarisse

May 24, 2021, 12:28:06 PM

Richard Damon <Ric...@Damon-Family.org> writes:

> On 5/24/21 7:11 AM, Ben Bacarisse wrote:
>> Keith Thompson <Keith.S.T...@gmail.com> writes:
>>
>>> Conclusions:
>>>
>>> sizeof (_Bool) >= 1. It may be greater than 1, but that would
>>> be weird. If sizeof (_Bool) > 1, then it must have padding bits.
>>
>> I don't understand how you draw that last conclusion.
>
> rank(_Bool) < rank(unsigned char) so
> max value of _Bool <= UCHAR_MAX so
> max number of value bits in _Bool is CHAR_BITS

Ah, yes. Thanks.

--
Ben.

jacobnavia

May 24, 2021, 12:40:19 PM

In a related issue, I have thought about implementing boolean arrays, where

_Bool tab[8];

sizeof(tab) == 1

I.e., represent boolean arrays as arrays of single bits, or bit arrays.
There are many advantages to bit arrays.

For instance they could replace the usual representation

int32_t flags;

#define FLAG_SOMETHING 1

if (flags & FLAG_SOMETHING)

etc. They would be very compact, making it possible to store boolean
values efficiently. The only problem that I saw was:

sizeof(tab[2]) == ????? (0.125?)

so I dropped this idea; I just did not know what I should return for that.

jacob

David Brown

May 24, 2021, 12:55:34 PM

Such compact arrays of booleans have their advantages and their
use-cases, but they also have their disadvantages - sometimes they will
be slower, sometimes faster than a normal boolean array.

But one thing that I am confident about is that it would not be
conforming to standard C - in particular, you couldn't take the address
of "tab[1]" and consider it to point to a normal bool variable.

So this would have to be either an extension to C, or a feature of your
container library. (Or it could be both - you could have an extension,
and your container library could have conditional compilation that spots
your compiler and uses the extension to implement it efficiently, while
falling back to standard C for other compilers.)
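
A minimal sketch of what the standard C fallback might look like,
packing flags into an unsigned char array. All names here are
hypothetical and no particular container library is implied:

```c
#include <limits.h>
#include <stddef.h>

/* Bytes needed to hold nbits one-bit flags. */
#define BITARRAY_BYTES(nbits) (((nbits) + CHAR_BIT - 1) / CHAR_BIT)

/* Reads bit number index (0-based) from the packed array. */
int bitarray_get(const unsigned char *bits, size_t index) {
    return (bits[index / CHAR_BIT] >> (index % CHAR_BIT)) & 1u;
}

/* Sets bit number index to 1 if value is nonzero, otherwise to 0. */
void bitarray_set(unsigned char *bits, size_t index, int value) {
    unsigned char mask = (unsigned char)(1u << (index % CHAR_BIT));
    if (value)
        bits[index / CHAR_BIT] |= mask;
    else
        bits[index / CHAR_BIT] &= (unsigned char)~mask;
}
```

Elements are reached only through the accessor functions, so no
pointer to an individual flag is ever formed, which sidesteps the
address-of problem.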

Bart

May 24, 2021, 1:31:43 PM

On 24/05/2021 17:40, jacobnavia wrote:
> In a related issue, I have thought about implementing boolen arrays, where
>
> _Bool tab[8];
>
> sizeof(tab) == 1
>
> I;e. represent boolean arrays as just arrays of 1 bit or bit-arrays.
> There are many advantages to bit arrays.
>
> For instance they could replace the usual representation
>
>

> etc. They would be very compact, making possible to store boolean
> values efficiently. The only problem that I saw was:
>
> sizeof(tab[2]) == ????? (0.125?)
>
> so I dropped this idea, I just did not know what I should return for that.

sizeof returns a size in bytes, so it would just be rounded up to the
nearest byte:

_Bool tab[50];

sizeof(tab) would be 7 (56 bits) (unless you want to pad it up to 64).
sizeof(tab[3]) would be 1

To work with bits, or to find out the length of the array, you will need
a new operator, such as bitsof():

bitsof(tab) would be 50
bitsof(tab[3]) would be 1

Where it would be troublesome however, is that C works with arrays using
pointers. Then you will need bit-pointers, and it starts getting messy
(I've done it).

> int32_t flags;
>
> #define FLAG_SOMETHING 1
>
> if (flags & FLAG_SOMETHING)

Short bit sequences up to 32 or 64 bits are a different matter. You
don't need bit-arrays and bit-pointers for this. I do it with bit
operations, for example:

uint32_t flags; // signed is not so appropriate

#define BIT_SOMETHING 0 // bit numbers not masks

if (flags.[BIT_SOMETHING]) // yields 0/1, not 0/non-0

But while flags can occupy more than one bit, it gets harder to combine
multiple unrelated flags, like bits 0, 3 and 10, compared with masks.
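
In standard C, which has no flags.[n] indexing, the same idea can be
sketched with a shift and a mask. The bit names and helper functions
below are hypothetical:

```c
#include <stdint.h>

/* Bit numbers rather than masks. */
enum { BIT_SOMETHING = 0, BIT_OTHER = 3, BIT_THIRD = 10 };

/* Yields exactly 0 or 1 for the given bit number (0..31). */
unsigned test_bit(uint32_t flags, unsigned bit) {
    return (flags >> bit) & 1u;
}

/* Builds a mask from a bit number so that unrelated flags can still
   be combined with | and tested together with &. */
uint32_t bit_mask(unsigned bit) {
    return (uint32_t)1 << bit;
}
```

Testing bits 0, 3 and 10 together then becomes
flags & (bit_mask(0) | bit_mask(3) | bit_mask(10)).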

Keith Thompson

May 24, 2021, 4:15:23 PM

Ben Bacarisse <ben.u...@bsb.me.uk> writes:
> Keith Thompson <Keith.S.T...@gmail.com> writes:
>> Conclusions:
>>
>> sizeof (_Bool) >= 1. It may be greater than 1, but that would
>> be weird. If sizeof (_Bool) > 1, then it must have padding bits.
>
> I don't understand how you draw that last conclusion.

[answered elsethread]

> <cut>
>> Here's a small program that attempts to explore how an implementation
>> represents objects of type _Bool:
>>
>> #include <stdio.h>
>> #include <limits.h>
>>
>> union U {
>> _Bool b;
>> unsigned char rep;
>> };
>
> When doing this kind of thing, my preference is to write
>
> union U {
> _Bool b;
> unsigned char rep[sizeof (_Bool)];
> };
>
> even when it's very unlikely that the size will be > 1. It makes the
> purpose so very clear.

Good point.

Keith Thompson

May 24, 2021, 4:30:49 PM

Tim Rentsch <tr.1...@z991.linuxsc.com> writes:
> Keith Thompson <Keith.S.T...@gmail.com> writes:
>> As promised, I've studied what the C standard says about the
>> requirements for the representation of _Bool. I've referred to the
>> C11 standard and to drafts of C17 and C2x (N2596). C11 and C17 do
>> not differ in this area as far as I can tell, but there are some
>> new things in the C2x proposal.
>
> Thank you, this looks good (and nice to have C17 and C2x included).
>
>> [...]
>>
>> The rank of _Bool shall be less than the rank of all other standard
>> integer types. This implies that the range of values of _Bool is
>> a subrange of the range of values of unsigned char. A _Bool object
>> cannot store a value less than 0 or greater than UCHAR_MAX.
>
> AFAICT the width of _Bool is permitted to be greater than the
> width of an extended unsigned integer type whose width is less
> than CHAR_BIT. It seems weird to allow that, but I don't see
> anything that forbids it.

For example, an implementation might have an extended integer type
_Nybble with 4 value bits. I'll look into that.

Right, C11 6.7.2.1p4:

The expression that specifies the width of a bit-field shall be an
integer constant expression with a nonnegative value that does not
exceed the width of an object of the type that would be specified
were the colon and expression omitted.

I even cited the footnote on that paragraph, but missed the paragraph
itself.

Both clang and gcc (latest versions) complain that the width of the bit
field exceeds that of its type, so they both have a width of 1 for
_Bool. I'll spend some time later looking into the question of trap
representations.

Philipp Klaus Krause

May 25, 2021, 5:40:57 AM

On 24.05.21 at 22:30, Keith Thompson wrote:

> I'll spend some time later looking into the question of trap
> representations.
>

AFAIK, for some compilers _Bool has trap representations, i.e. if you
(e.g. via memcpy) put a value other than false or true into a bool, you
get undefined behaviour when reading that value. On architectures where
it is faster, a jump table might be used instead of a conditional jump
for an if/else construct. Or the code generation for a switch (a + b + c
+ d) will assume that the possible range is 0 to 4 if a to d are bool,
and generate a jump table just covering that range.
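
One way to stay clear of the problem, sketched with hypothetical
helper names, is to build the bool through the normal conversion rule
instead of copying an untrusted byte into it:

```c
/* Builds a _Bool from an untrusted byte via ordinary conversion, so
   the resulting object always holds 0 or 1. */
_Bool bool_from_untrusted_byte(unsigned char byte) {
    return byte != 0;
}

/* Sums four flags; with inputs built as above the result is 0..4. */
int count_set(_Bool a, _Bool b, _Bool c, _Bool d) {
    return a + b + c + d;
}
```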

Philipp

Vir Campestris

May 25, 2021, 4:20:37 PM

On 24/05/2021 12:43, Richard Damon wrote:
> rank(_Bool) < rank(unsigned char) so
> max value of _Bool <= UCHAR_MAX so
> max number of value bits in _Bool is CHAR_BITS
>
> _Bool has sizeof(_Bool)*CHAR_BIT bits in it, and only CHAR_BIT of them
> can be value bits.
>
> if sizeof(_Bool) > 1 there are bits left over that aren't value or sign
> (since it doesn't have any, being an unsigned type) bits, so must be
> padding bits.

Once upon a time I was working with a TI graphics processor which had
arbitrary sized values - very handy if you were doing as we were and had
a 3-bit greyscale display.

The difference between two adjacent _bits_ was 1.
Between two adjacent _bytes_ it was 8.

Single bit booleans make absolute sense on a device like that.

And yes, we were working in C. And some things didn't port.

Andy
--
Under the hood there was a 32-bit bus, and setting 1 bit was a
read-modify-write - but not even visible at assembler level.
